DETAILED ACTION

This office action is in response to the newly submitted amendment filed October 
15, 2008. 

Examiner's Amendment 

An examiner's amendment to the record appears below. Should the changes 
and/or additions be unacceptable to applicant, an amendment may be filed as provided 
by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be
submitted no later than the payment of the issue fee. 

Authorization for this examiner's amendment was given in a telephone interview 
with Attorney John Burbage on October 17, 2008. 

SPECIFICATION AMENDMENT 

Please amend paragraph [0034] as follows: 

In one embodiment, the documents are then compared to create 
clusters of a number of closely related documents. For example, each 
sample document may be compared and then aligned with the closest 
nine documents to create a cluster of ten documents. In one embodiment, 
the dynamic programming alignment algorithm is used to compare the 
documents. The dynamic programming alignment algorithm compares and 
aligns documents to compute relative scores for the compared documents 
known as edit distances. The edit distance can be a number, roughly
proportional to the number of insertions and deletions necessary to 
transform one document into another. An alignment of two documents, for 
example, can be a list of those insertions or deletions, or equivalently, a 
mapping from parts of one document to parts of another. Dynamic 
programming alignment is a method understood by those skilled in the art, 
and accordingly need not be described in further detail herein. [[Further
details on dynamic programming may be found in Dan Gusfield,
Algorithms on Strings, Trees, and Sequences, Computer Science and
Computational Biology (Cambridge University Press 1997), which is
incorporated herein by reference.]]
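
For illustration only, the edit distance and alignment described in paragraph [0034] can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: documents are assumed to be lists of tokens, only insertions and deletions are counted, and the function name `align` and the operation labels are hypothetical.

```python
def align(a, b):
    """Return (edit_distance, operations) for two token sequences.

    Assumption: only insertions and deletions cost 1; a matching token
    costs nothing. The operations list is one possible alignment.
    """
    n, m = len(a), len(b)
    # dist[i][j] = cost of transforming a[:i] into b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every token of a[:i]
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every token of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dist[i][j] = dist[i - 1][j - 1]
            else:
                dist[i][j] = 1 + min(dist[i - 1][j], dist[i][j - 1])

    # Walk back from the bottom-right cell to recover the alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            ops.append(("match", a[i - 1]))
            i -= 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", a[i - 1]))
            i -= 1
        else:
            ops.append(("insert", b[j - 1]))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


doc_a = "the price is 10 dollars".split()
doc_b = "the price is 25 dollars".split()
distance, operations = align(doc_a, doc_b)
```

In this example the non-matching operations mark the tokens that differ between the two documents, which corresponds to the "mapping from parts of one document to parts of another" mentioned in the paragraph.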

CLAIM AMENDMENTS

1. (Canceled). 

2. (Canceled). 

3. (Canceled). 

4. (Previously Presented) The method of claim 12, further comprising 
determining a cluster of related articles from the related articles. 

5. (Previously Presented) The method of claim 4, wherein determining the 
cluster of related articles is performed by: 

using the dynamic programming alignment algorithm to compute edit 
distances between the seed article and the related articles; and 
choosing the cluster of related articles based on the edit distances. 

6. (Original) The method of claim 4, wherein the identifying at least one 
information field within the seed article is performed by comparing the seed article to the 
cluster of articles. 

7. (Canceled).

8. (Previously Presented) The method of claim 12, wherein the articles are web
pages.

9. (Original) The method of claim 8, wherein the related articles are web pages 
on a web site. 

10. (Original) The method of claim 9, further comprising simplifying the content 
on a web page. 

11. (Original) The method of claim 10, wherein simplifying the content includes
preserving visible text, visible images, and visible paragraph and table formatting. 

12. (Currently Amended) A method for information extraction, comprising: 
accessing a plurality of related articles; 

determining a seed article from the related articles, the seed article containing
variable data;

identifying at least one information field within the seed article by comparing 
the seed article to at least one other related article, the comparison 
comprising using a dynamic programming alignment algorithm to 
determine an alignment between the seed article and the related 
article; 

creating a template based on the identified information field, the template
identifying an information field in the related articles corresponding to
the variable data;

identifying a plurality of templates each comprising at least one information
field; 

comparing a source article to the templates to determine a closest template;

associating data from the source article with an information field from the
closest template; and

extracting the associated data.

13.-14. (Canceled). 

15. (Currently Amended) A method of extracting data from a source article, 
comprising: 

identifying a plurality of templates each comprising at least one information 
field corresponding to variable data in articles;

comparing the source article to the templates to determine a closest template, 
wherein comparing the source article to the templates is performed by 
a dynamic programming alignment algorithm to compute an edit 
distance between the source article and the templates; 

associating data from the source article with an information field 
corresponding to variable data from the closest template; 

extracting the associated data; and 

displaying the associated data. 



16.-18. (Canceled).

19. (Previously Presented) The computer program product of claim 23, wherein
comparing the seed article to at least one other related article is performed by using the 
dynamic programming alignment algorithm to determine an alignment between the seed 
article and the related article. 

20. (Previously Presented) The computer program product of claim 23, further 
comprising computer program code for determining a cluster of related articles from the 
related articles. 

21. (Previously Presented) The computer program product of claim 20, wherein
determining a cluster of related articles is performed by: 

using the dynamic programming alignment algorithm to compute edit 
distances between the seed article and the related articles; and 
choosing the cluster of related articles based on the edit distances. 

22. (Previously Presented) The computer program product of claim 20, wherein 
the identifying at least one information field within the seed article is performed by 
comparing the seed article to the cluster of related articles. 

23. (Currently Amended) A computer program product having a tangible
computer-readable storage medium having [[computer executable]] processor-executable
code encoded thereon for performing information extraction when executed by a
processor, the [[computer executable]] processor-executable code comprising code for:

accessing a plurality of related articles;

determining a seed article from the related articles, the seed article containing
variable data;

identifying at least one information field within the seed article by comparing 
the seed article to at least one other related article; 

creating a template based on the identified information field, the template
identifying an information field in the related articles corresponding to
the variable data;

identifying a plurality of templates each comprising at least one information 
field; 

comparing a source article to the templates to determine a closest template, 
the comparison comprising using a dynamic programming alignment 
algorithm to compute an edit distance between the source article and 
the templates; 

associating data from the source article with an information field from the
closest template; and

extracting the associated data.

24. (Canceled). 



25. (Previously Presented) The computer program product of claim 23, further 
comprising code for: 

displaying the associated data.

26. (Previously Presented) The computer program product of claim 23, further
comprising code for: 

storing the associated data. 

Based on Applicant's arguments, Examiner formally withdraws the objection to 
the specification. 

Allowable Subject Matter 

Claims 4-6, 8-12, 15, 19-23, and 25-26 are allowed over the prior art made of
record.

The following is an examiner's statement of reasons for allowance: 

Regarding independent claims 12, 15, and 23, under the broadest reasonable
interpretation of the claimed limitations consistent with Applicant's specification, the
prior art cited in the record fails to teach all of Applicant's claimed limitations. In
particular, the claimed invention advantageously provides a finer level of detail that
enables determining a seed article containing variable data from the related
articles, identifying at least one information field within the seed article by comparing
the seed article to another related article, using a dynamic programming alignment
algorithm to determine an alignment between the seed article and the related article,
creating a template based on the identified information field corresponding to the
variable data, identifying templates, and comparing a source article to the templates
to determine a closest template.

Thus, the prior art of record neither renders obvious nor anticipates the combination of
claimed elements in light of the specification.
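
For illustration only, the steps of choosing a cluster of related articles and determining a closest template by edit distance can be sketched as follows. This is a sketch under stated assumptions rather than the claimed method itself: articles and templates are assumed to be token lists, the placeholder token `FIELD` marks an information field, and the names `edit_distance`, `closest_template`, and `nearest_cluster` are hypothetical.

```python
def edit_distance(a, b):
    """Insertion/deletion distance between two token sequences,
    computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1])
            else:
                cur.append(1 + min(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def closest_template(source, templates):
    """Return the name of the template with the smallest edit distance
    to the source article. `templates` maps a name to a token list."""
    return min(templates, key=lambda name: edit_distance(source, templates[name]))


def nearest_cluster(seed, articles, size):
    """Return the `size` articles closest to the seed, nearest first."""
    return sorted(articles, key=lambda art: edit_distance(seed, art))[:size]


# Hypothetical templates; FIELD stands in for variable data.
templates = {
    "product": "name FIELD price FIELD".split(),
    "news": "headline FIELD byline FIELD body FIELD".split(),
}
source = "name Widget price 10".split()
best = closest_template(source, templates)

cluster = nearest_cluster(
    "a b c".split(),
    [["a", "b", "c", "d"], ["x", "y"], ["a", "b"]],
    size=2,
)
```

Here `best` names the template sharing the most fixed tokens with the source article, and `cluster` holds the two articles with the smallest edit distance to the seed.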

Dependent claims 4-6, 8-11, 19-22, and 25-26 are allowed at least by virtue of
their dependencies from their pertinent independent claims. 

After a further search and a thorough examination of the present application and 
in light of the prior art made of record, claims 4-6, 8-12, 15, 19-23, and 25-26 are
allowed. 

Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably 
accompany the issue fee. Such submissions should be clearly labeled "Comments on 
Statement of Reasons for Allowance." 

Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Diane D. Mizrahi, whose telephone number is
571-272-4079. The examiner can normally be reached Monday-Thursday (9:30 - 4:30 p.m.).

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Christian Chase, can be reached at (571) 272-4190. The fax phone
numbers for the organization where this application or proceeding is assigned are (703)
872-9306 for regular communications and (703) 305-3900 for After Final
communications.

Any inquiry of a general nature or relating to the status of this application or 
proceeding should be directed to the receptionist whose telephone number is
(571) 272-2100.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. 

For more information about the PAIR system, see http://pair-direct.uspto.gov.

Should you have questions on access to the Private PAIR system, contact the
Electronic Business Center (EBC) at 866-217-9197 (toll free).

/Diane Mizrahi/ 

Diane.Mizrahi@USPTO.gov 
Primary Patent Examiner 
Technology Center 2100 